| Group | Died | Survived |
|---|---|---|
| Control | 39 | 11 |
| Treatment | 26 | 14 |
\[ \begin{aligned} \hat{p}_T &= \frac{14}{40} = 0.35 \\ \hat{p}_C &= \frac{11}{50} = 0.22 \\ \hat{p}_T - \hat{p}_C &= 0.35 - 0.22 = 0.13 \end{aligned} \]
To model what would happen if \(H_0\) were true:
Compute (repeatedly) a simulated difference in survival rates to get a null distribution.
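A minimal sketch of this simulation in R, assuming a one-sided alternative (the treatment improves survival) and shuffling the group labels to mimic \(H_0\). The seed, the number of repetitions, and all object names are illustrative choices, not part of the original analysis.

```r
# Randomization test for the difference in survival rates (sketch).
set.seed(1)  # arbitrary seed, only for reproducibility

# One entry per patient, rebuilt from the two-way table.
group   <- rep(c("control", "treatment"), times = c(50, 40))
outcome <- c(rep(c("died", "survived"), times = c(39, 11)),  # control
             rep(c("died", "survived"), times = c(26, 14)))  # treatment

# Difference in survival rates: treatment minus control.
diff_in_props <- function(outcome, group) {
  mean(outcome[group == "treatment"] == "survived") -
    mean(outcome[group == "control"] == "survived")
}

obs_diff <- diff_in_props(outcome, group)  # 0.35 - 0.22

# If H0 is true, the labels carry no information, so shuffle them.
n_reps <- 5000
null_diffs <- replicate(n_reps, diff_in_props(outcome, sample(group)))

# One-sided p-value: share of shuffles at least as large as the observed
# difference (small tolerance guards against floating-point ties).
p_value <- mean(null_diffs >= obs_diff - 1e-12)
```

The vector `null_diffs` is the simulated null distribution; `hist(null_diffs)` would show it centered near 0.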
To construct a confidence interval, we don’t need hypotheses. We just use the data.
| Group | Died | Survived |
|---|---|---|
| Control | 39 | 11 |
| Treatment | 26 | 14 |
\[ \begin{aligned} \hat{p}_T &= \frac{14}{40} = 0.35 \\ \hat{p}_C &= \frac{11}{50} = 0.22 \\ \hat{p}_T - \hat{p}_C &= 0.35 - 0.22 = 0.13 \end{aligned} \]
For a 90% confidence interval, we need 5% in each tail.
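One way this percentile interval might be computed in R is sketched below. It assumes each group is resampled separately with replacement; the seed, the number of bootstrap samples, and the object names are illustrative.

```r
# Bootstrap percentile interval for p_T - p_C (sketch).
set.seed(2)  # arbitrary seed, only for reproducibility

# 1 = survived, 0 = died, rebuilt from the two-way table.
treatment <- rep(c(1, 0), times = c(14, 26))
control   <- rep(c(1, 0), times = c(11, 39))

n_boot <- 5000
boot_diffs <- replicate(n_boot, {
  # Resample each group with replacement, keeping its original size.
  mean(sample(treatment, replace = TRUE)) -
    mean(sample(control, replace = TRUE))
})

# 90% interval: cut off 5% in each tail of the bootstrap distribution.
ci_90 <- quantile(boot_diffs, probs = c(0.05, 0.95))
```

For a 95% interval the same call would use `probs = c(0.025, 0.975)`.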
Once we have a bootstrap distribution, we can estimate the standard error by taking its standard deviation with sd().
\[ SE_{\hat{p}_T - \hat{p}_C} \approx SE_{\hat{p}_{T, boot} - \hat{p}_{C, boot}} = 0.098 \]
Then use the generic formula to get a 95% confidence interval:
\[ \hat{p}_T - \hat{p}_C \pm 1.96 \cdot SE = 14/40 - 11/50 \pm 1.96 \cdot 0.098 = (-0.06, 0.32) \]
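The SE-based interval can be sketched in R as follows. This version uses a shortcut that is an assumption of the sketch rather than the original method: resampling a group of size \(n\) with replacement gives a Binomial(\(n\), \(\hat{p}\)) count of survivors, so `rbinom()` stands in for explicit resampling. Because the bootstrap is random, the standard error should come out close to, but not exactly, 0.098.

```r
# Bootstrap standard error and the generic 95% interval (sketch).
set.seed(3)  # arbitrary seed, only for reproducibility

p_hat_T <- 14 / 40
p_hat_C <- 11 / 50
n_boot  <- 5000

# Bootstrap proportions for each group via binomial draws.
boot_T <- rbinom(n_boot, size = 40, prob = p_hat_T) / 40
boot_C <- rbinom(n_boot, size = 50, prob = p_hat_C) / 50
boot_diffs <- boot_T - boot_C

# The SD of the bootstrap distribution estimates the standard error.
se_boot <- sd(boot_diffs)

# Generic formula: observed statistic +/- z* x SE.
ci_95 <- (p_hat_T - p_hat_C) + c(-1, 1) * qnorm(0.975) * se_boot
```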
Fifty people attending a local flea market were recruited to participate. Subjects were ushered, one at a time, into one of three rooms by co-host Kari. She yawned (planting a yawn “seed”) as she ushered subjects into two of the rooms, and for the other room she did not yawn. The researchers decided in advance, with a random mechanism, which subjects went to which room. As time passed, the researchers watched to see which subjects yawned.
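Let \(p_1\) be the proportion of people who yawn after observing the yawn seed and \(p_2\) the proportion who yawn without observing it.
\[ \begin{aligned} H_0&: p_1 - p_2 = 0 \\ H_A&: p_1 - p_2 > 0 \end{aligned} \]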
| | Seed observed | Seed not observed | Total |
|---|---|---|---|
| Subject yawned | 11 | 3 | 14 |
| Did not yawn | 23 | 13 | 36 |
| Total | 34 | 16 | 50 |
\[ \begin{aligned} \hat{p}_1 &= 11/34 \approx 0.32 \\ \hat{p}_2 &= 3/16 \approx 0.19 \\ \hat{p}_1-\hat{p}_2 &= 11/34-3/16 \approx 0.136 = \text{test statistic} \end{aligned} \]
To evaluate the statistical significance of the observed difference, we will investigate how large the difference in proportions tends to be just from the random assignment of response outcomes to the explanatory variable groups.
Under the null hypothesis, yawning is unrelated to seeing the yawn seed: of these 50 people, 14 would have yawned and 36 would not have, whichever room they were assigned to.
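With 14 yawners fixed among 50 people and 34 of them assigned at random to the seed rooms, the number of yawners landing in the seed group follows a hypergeometric distribution. The sketch below uses that fact to compute the one-sided p-value exactly instead of by simulation. This is a swapped-in alternative to repeated shuffling, and the object names are illustrative.

```r
# Exact version of the randomization test for the yawning study (sketch).
yawners     <- 14  # people who yawned
non_yawners <- 36  # people who did not yawn
seed_size   <- 34  # people assigned to a seed room
observed    <- 11  # yawners actually observed in the seed group

# The difference in proportions grows with the number of yawners in the
# seed group, so "difference >= 0.136" is the same event as "count >= 11".
p_exact <- phyper(observed - 1, m = yawners, n = non_yawners,
                  k = seed_size, lower.tail = FALSE)

# Fisher's exact test with a one-sided alternative is the same calculation.
tab <- matrix(c(11, 23, 3, 13), nrow = 2,
              dimnames = list(c("yawned", "did not yawn"),
                              c("seed", "no seed")))
p_fisher <- fisher.test(tab, alternative = "greater")$p.value
```

A simulated p-value from shuffling should land close to `p_exact`, with the gap shrinking as the number of shuffles grows.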
Conditions:
Independence (e.g., a randomized experiment)
Success-failure condition: at least 10 successes and 10 failures in each of the two groups.
\[ SE_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}} \]
For a confidence interval, we use the “best guess” for \(p_1\) and \(p_2\), and apply the generic formula:
\[ \begin{aligned} \text{observed statistic} \ &\pm \ z^{\star} \ \times \ SE \\ (\hat{p}_1 - \hat{p}_2) \ &\pm \ z^{\star} \times \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \end{aligned} \]
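As a worked illustration, the formula can be applied to the survival data from the earlier table (\(\hat{p}_T = 14/40\), \(\hat{p}_C = 11/50\)). Each group has at least 10 successes and 10 failures (14 and 26, 11 and 39), so the success-failure condition holds. Using \(z^{\star} = 1.96\) for 95% confidence:

\[ \begin{aligned} SE &\approx \sqrt{\frac{0.35 \times 0.65}{40} + \frac{0.22 \times 0.78}{50}} \approx \sqrt{0.00569 + 0.00343} \approx 0.0955 \\ (\hat{p}_T - \hat{p}_C) \ &\pm \ z^{\star} \times SE = 0.13 \pm 1.96 \times 0.0955 \approx (-0.057, 0.317) \end{aligned} \]

This is close to the bootstrap-based interval \((-0.06, 0.32)\) obtained from the same data.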
Recall the Yawning Study:
\[ \begin{aligned} \hat{p}_1 &= 11/34 \approx 0.32 \\ \hat{p}_2 &= 3/16 \approx 0.19 \\ \hat{p}_1-\hat{p}_2 &= 11/34-3/16 \approx 0.136 \end{aligned} \]
1. Are the conditions for the mathematical model satisfied?
2. Regardless of (1), compute \(SE_{\hat{p}_1-\hat{p}_2}\) using the mathematical model.
3. Use the generic formula to obtain a 95% confidence interval for \(p_1 - p_2\).
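The three steps could be carried out in R along the following lines. This is a sketch with illustrative object names; it applies the formula directly, with no continuity correction, so its interval is somewhat narrower than the one `prop.test` reports.

```r
# Yawning study: condition check, SE, and 95% interval (sketch).
x <- c(11, 3)    # yawners: seed observed, seed not observed
n <- c(34, 16)   # group sizes
p_hat <- x / n

# Step 1: success-failure condition, at least 10 of each outcome per group.
conditions_ok <- all(x >= 10) && all(n - x >= 10)  # FALSE: only 3 yawners

# Step 2: standard error from the mathematical model, plugging in p_hat.
se <- sqrt(sum(p_hat * (1 - p_hat) / n))

# Step 3: generic formula, observed statistic +/- z* x SE.
z_star <- qnorm(0.975)
ci_95 <- (p_hat[1] - p_hat[2]) + c(-1, 1) * z_star * se
```

Because the condition fails here, the interval should be read with caution; a simulation-based method is the safer choice for these data.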
The prop.test command can be used to test the null hypothesis that the proportions (probabilities of success) in two groups are the same, or that they equal certain given values. prop.test also gives confidence intervals.
Warning in prop.test(x = c(11, 3), n = c(34, 16), alternative = "greater"): Chi-squared
approximation may be incorrect
2-sample test for equality of proportions with continuity correction
data: c(11, 3) out of c(34, 16)
X-squared = 0.43786, df = 1, p-value = 0.2541
alternative hypothesis: greater
95 percent confidence interval:
-0.1177157 1.0000000
sample estimates:
prop 1 prop 2
0.3235294 0.1875000
Warning in prop.test(x = c(11, 3), n = c(34, 16)): Chi-squared approximation may be incorrect
2-sample test for equality of proportions with continuity correction
data: c(11, 3) out of c(34, 16)
X-squared = 0.43786, df = 1, p-value = 0.5082
alternative hypothesis: two.sided
95 percent confidence interval:
-0.1575227 0.4295815
sample estimates:
prop 1 prop 2
0.3235294 0.1875000
2-sample test for equality of proportions with continuity correction
data: c(14, 11) out of c(40, 50)
X-squared = 1.2801, df = 1, p-value = 0.1289
alternative hypothesis: greater
95 percent confidence interval:
-0.04957706 1.00000000
sample estimates:
prop 1 prop 2
0.35 0.22
2-sample test for equality of proportions with continuity correction
data: c(14, 11) out of c(40, 50)
X-squared = 1.2801, df = 1, p-value = 0.2579
alternative hypothesis: two.sided
95 percent confidence interval:
-0.07966886 0.33966886
sample estimates:
prop 1 prop 2
0.35 0.22
2-sample test for equality of proportions with continuity correction
data: c(14, 11) out of c(40, 50)
X-squared = 1.2801, df = 1, p-value = 0.2579
alternative hypothesis: two.sided
90 percent confidence interval:
-0.04957706 0.30957706
sample estimates:
prop 1 prop 2
0.35 0.22
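The commands that produced the output above are not shown. Judging from the warning messages and the `data:` lines, calls along the following lines would reproduce it. The object names are illustrative, and the continuity correction is left at its default (`correct = TRUE`).

```r
# Yawning study: x = yawners, n = group sizes (seed, no seed).
# R warns that the chi-squared approximation may be incorrect,
# because one group has only 3 yawners.
yawn_one_sided <- prop.test(x = c(11, 3), n = c(34, 16),
                            alternative = "greater")
yawn_two_sided <- prop.test(x = c(11, 3), n = c(34, 16))

# Survival data: x = survivors, n = group sizes (treatment, control).
cpr_one_sided <- prop.test(x = c(14, 11), n = c(40, 50),
                           alternative = "greater")
cpr_two_sided <- prop.test(x = c(14, 11), n = c(40, 50))

# Same two-sided test, reporting a 90% confidence interval instead.
cpr_90 <- prop.test(x = c(14, 11), n = c(40, 50), conf.level = 0.90)

# Individual pieces can be pulled out of the returned object.
cpr_90$conf.int
cpr_one_sided$p.value
```

The lower bound of the one-sided 95% interval matches the lower bound of the two-sided 90% interval, since both leave 5% in the lower tail.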
Suppose we test \(H_0: p_1 - p_2 = 0\) versus \(H_A: p_1 - p_2 \neq 0\), and we also construct a confidence interval for \(p_1 - p_2\).